Towards a Better Understanding of Discourse: Integrating Multiple Discourse Annotation Perspectives Using UIMA
Authors
Abstract
Discourse annotation schemes vary both in the perspectives of discourse structure they consider and in the granularity of the textual units they annotate. Comparing and integrating multiple schemes has the potential to provide enhanced information. However, the differing formats of the corpora and tools that contain or produce such annotations can be a barrier to their integration. U-Compare is a graphical, UIMA-based workflow construction platform for combining interoperable natural language processing (NLP) resources without the need for programming skills. In this paper, we present an extension of U-Compare that allows the easy comparison, integration and visualisation of resources that contain or output annotations based on multiple discourse annotation schemes. The extension works by allowing the construction of parallel sub-workflows for each scheme within a single U-Compare workflow. The different types of discourse annotations produced by each sub-workflow can be either merged or visualised side-by-side for comparison. We demonstrate this new functionality by using it to compare annotations belonging to two different approaches to discourse analysis, namely discourse relations and functional discourse annotations. Integrating these different annotation types within an interoperable environment allows us to study the correlations between different types of discourse, and we report on the new insights that this integration reveals.
∗ The authors have contributed equally to the development of this work and production of the manuscript.
Similar resources
Characterizing Online Discussion Using Coarse Discourse Sequences
In this work, we present a novel method for classifying comments in online discussions into a set of coarse discourse acts towards the goal of better understanding discussions at scale. To facilitate this study, we devise a categorization of coarse discourse acts designed to encompass general online discussion and allow for easy annotation by crowd workers. We collect and release a corpus of ov...
Towards an Annotated Corpus of Discourse Relations in Hindi
We describe our initial efforts towards developing a large-scale corpus of Hindi texts annotated with discourse relations. Adopting the lexically grounded approach of the Penn Discourse Treebank (PDTB), we present a preliminary analysis of discourse connectives in a small corpus. We describe how discourse connectives are represented in the sentence-level dependency annotation in Hindi, and disc...
Necessities of Developing Diverse Cultural Potentials in Academic Discourse
The absolute hegemony of the international code of (academic) communication has resulted in the development and spread of the discoursal voice of the culture from which historical English has emerged, and, as a consequence, any deviation from the generic conventions and thinking patterns born out of such a discourse has resulted in the deprivation of non-native thinkers from active participation in...
Discourse Analysis of Iranian Intellectuals about Law on the Edge of Constitutional Revolution (Study of Mirza Malcom Khan and Mostashar al-Dowleh)
About 110 years ago, Iran experienced a revolution, which was known to be an attempt at development and modernization. The revolutionary roots had been established by newly emerging social forces known as intellectuals. Since studying the past highlights the future, the necessity of intellectuals’ discourse as well as understanding their discourse strategies on the road to development and modernit...
The CLaC Discourse Parser at CoNLL-2015
This paper describes our submission (kosseim15) to the CoNLL-2015 shared task on shallow discourse parsing. We used the UIMA framework to develop our parser and used ClearTK to add machine learning functionality to the UIMA framework. Overall, our parser achieves a result of 17.3 F1 on the identification of discourse relations on the blind CoNLL-2015 test set, ranking in sixth place.